
    Combining Multiple Views for Visual Speech Recognition

    Visual speech recognition is a challenging research problem with a particular practical application of aiding audio speech recognition in noisy scenarios. Multiple camera setups can be beneficial for visual speech recognition systems in terms of improved performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition. The thorough analysis covers fusion of all possible view angle combinations both at feature level and decision level. The employed visual speech recognition system in this study extracts features through a PCA-based convolutional neural network, followed by an LSTM network. Finally, these features are processed in a tandem system, being fed into a GMM-HMM scheme. The decision fusion acts after this point by combining the Viterbi path log-likelihoods. The results show that the complementary information contained in recordings from different view angles improves the results significantly. For example, the sentence correctness on the test set is increased from 76% for the highest performing single view ($30^\circ$) to up to 83% when combining this view with the frontal and $60^\circ$ view angles.

    Global behaviour of a composite stiffened panel in buckling. Part 2: Experimental investigation

    The present study analyses an aircraft composite fuselage structure manufactured by the Liquid Resin Infusion (LRI) process and subjected to a compressive load. LRI is based on the moulding of high performance composite parts by infusing liquid resin on dry fibres instead of prepreg fabrics or Resin Transfer Moulding (RTM). Current industrial projects face composite integrated structure issues as a number of structures (stiffeners, …) are increasingly integrated onto the skins of aircraft fuselage. A post-buckling test of a composite fuselage representative panel is set up, based on numerical results available in previous works. Two stereo Digital Image Correlation (DIC) systems are positioned on each side of the panel to correlate numerical and experimental out-of-plane displacements (corresponding to the skin local buckling displacements of the panel). First, the experimental approach and the test facility are presented. A post-mortem failure analysis is then performed with the help of Non-Destructive Techniques (NDT). X-ray Computed Tomography (CT) measurements and ultrasonic testing (US) techniques are able to explain the failure mechanisms that occurred during this post-buckling test. Numerical results are validated by the experimental results.

    Visual speech recognition: from traditional to deep learning frameworks

    Speech is the most natural means of communication for humans. Therefore, since the beginning of computers it has been a goal to interact with machines via speech. While there have been gradual improvements in this field over the decades, and with recent drastic progress more and more commercial software is available that allows voice commands, there are still many ways in which it can be improved. One way to do this is with visual speech information, more specifically, the visible articulations of the mouth. Based on the information contained in these articulations, visual speech recognition (VSR) transcribes an utterance from a video sequence. It thus helps extend speech recognition from audio-only to other scenarios such as silent or whispered speech (e.g. in cybersecurity), mouthings in sign language, as an additional modality in noisy audio scenarios for audio-visual automatic speech recognition, to better understand speech production and disorders, or by itself for human-machine interaction and as a transcription method. In this thesis, we present and compare different ways to build systems for VSR: We start with the traditional hidden Markov models that have been used in the field for decades, especially in combination with handcrafted features. These are compared to models taking into account recent developments in the fields of computer vision and speech recognition through deep learning. While their superior performance is confirmed, certain limitations with respect to computing power for these systems are also discussed. This thesis also addresses multi-view processing and fusion, which is an important topic for many current applications. This is due to the fact that a single camera view often cannot provide enough flexibility with speakers moving in front of the camera.
Technology companies are willing to integrate more cameras into their products, such as cars and mobile devices, due to lower hardware cost for both cameras and processing units, as well as the availability of higher processing power and high performance algorithms. Multi-camera and multi-view solutions are thus becoming more common, which means that algorithms can benefit from taking these into account. In this work we propose several methods of fusing the views of multiple cameras to improve the overall results. We show that both relying on deep learning-based approaches for feature extraction and sequence modelling and taking into account the complementary information contained in several views improve performance considerably. To further improve the results, it would be necessary to move from data recorded in a lab environment to multi-view data in realistic scenarios. Furthermore, the findings and models could be transferred to other domains such as audio-visual speech recognition or the study of speech production and disorders.

    Global behaviour of a composite stiffened panel in buckling. Part 1: Numerical modelling

    The present study analyses an aircraft composite fuselage structure manufactured by the Liquid Resin Infusion (LRI) process and subjected to a compressive load. LRI is based on the moulding of high performance composite parts by infusing liquid resin on dry fibres instead of prepreg fabrics or Resin Transfer Moulding (RTM). Current industrial projects face composite integrated structure issues as a number of structures (stiffeners, …) are increasingly integrated onto the skins of aircraft fuselage. A representative panel of a composite fuselage to be tested in buckling is studied numerically. This paper studies which of the real behaviours of the integrated structures are to be observed during this test. Numerical models are studied at a global scale of the composite stiffened panel. Linear and nonlinear analyses are conducted. The Tsai–Wu criterion with a progressive failure analysis is implemented to describe the global behaviour of the panel up to collapse. Also, three stiffener connection methods are compared at the intersection between two types of integrated structures. Load-shortening curves make it possible to estimate the expected load and displacements.

    Paternal kin recognition in the high frequency / ultrasonic range in a solitary foraging mammal

    Background: Kin selection is a driving force in the evolution of mammalian social complexity. Recognition of paternal kin using vocalizations occurs in taxa with cohesive, complex social groups. This is the first investigation of paternal kin recognition via vocalizations in a small-brained, solitary foraging mammal, the grey mouse lemur (Microcebus murinus), a frequent model for ancestral primates. We analyzed the high frequency/ultrasonic male advertisement (courtship) call and alarm call. Results: Multi-parametric analyses of the calls’ acoustic parameters and discriminant function analyses showed that advertisement calls, but not alarm calls, contain patrilineal signatures. Playback experiments controlling for familiarity showed that females paid more attention to advertisement calls from unrelated males than from their fathers. Reactions to alarm calls from unrelated males and fathers did not differ. Conclusions: 1) Findings provide the first evidence of paternal kin recognition via vocalizations in a small-brained, solitarily foraging mammal. 2) High predation, small body size, and dispersed social systems may select for acoustic paternal kin recognition in the high frequency/ultrasonic ranges, thus limiting risks of inbreeding and eavesdropping by predators or conspecific competitors. 3) Paternal kin recognition via vocalizations in mammals is not dependent upon a large brain and high social complexity, but may already have been an integral part of the dispersed social networks from which more complex, kin-based sociality emerged.

    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading or lipreading is the technique of understanding and getting phonetic features from a speaker's visual features such as movement of lips, face, teeth and tongue. It has a wide range of multimedia applications such as in surveillance, Internet telephony, and as an aid to a person with hearing impairments. However, most of the work in speechreading has been limited to text generation from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a user's speech are available, they have not been used to deal with the different poses. To this end, this paper presents the world's first ever multi-view speech reading and reconstruction system. This work encompasses the boundaries of multimedia research by putting forth a model which leverages silent video feeds from multiple cameras recording the same subject to generate intelligent speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speech reading and reconstruction system. It further shows the optimal placement of cameras which would lead to the maximum intelligibility of speech. Next, it lays out various innovative applications for the proposed system, focusing on its potential prodigious impact not just in the security arena but in many other multimedia analytics problems. Comment: 2018 ACM Multimedia Conference (MM '18), October 22-26, 2018, Seoul, Republic of Korea

    Neural crest requires Impdh2 for development of the enteric nervous system, great vessels, and craniofacial skeleton

    Mutations that impair the proliferation of enteric neural crest-derived cells (ENCDC) cause Hirschsprung disease, a potentially lethal birth defect where the enteric nervous system (ENS) is absent from distal bowel. Inosine 5′ monophosphate dehydrogenase (IMPDH) activity is essential for de novo GMP synthesis, and chemical inhibition of IMPDH induces Hirschsprung disease-like pathology in mouse models by reducing ENCDC proliferation. Two IMPDH isoforms are ubiquitously expressed in the embryo, but only IMPDH2 is required for life. To further understand the role of IMPDH2 in ENS and neural crest development, we characterized a conditional Impdh2 mutant mouse. Deletion of Impdh2 in the early neural crest using the Wnt1-Cre transgene produced defects in multiple neural crest derivatives including highly penetrant intestinal aganglionosis, agenesis of the craniofacial skeleton, and cardiac outflow tract and great vessel malformations. Analysis using a Rosa26 reporter mouse suggested that some or all of the remaining ENS in Impdh2 conditional-knockout animals was derived from cells that escaped Wnt1-Cre-mediated DNA recombination. These data suggest that IMPDH2-mediated guanine nucleotide synthesis is essential for normal development of the ENS and other neural crest derivatives.